MEDB 5505, Module01

2025-01-23

Special note

  • This slide show was created using R.

Topics to be covered

  • What you will learn
    • History of R
    • Installing R
    • Objects in R
    • Anatomy of a small R program
    • Live demonstration
    • Good programming practices
    • Your programming assignment

History of R

This portion of the talk will be found in the file history-of-r.pptx.

Break #1

  • What you have learned
    • History of R
  • What’s coming next
    • Installing R

Installing R (https://cran.r-project.org/)

Screenshot of webpage for installation of R

Installing RStudio (https://rstudio.com/)

Screenshot of main page for RStudio

Installing RStudio (https://rstudio.com/)

Screenshot of products of RStudio

Installing R and RStudio

  • R is required
  • RStudio is strongly recommended
  • Do not delay in getting this software installed
  • Find me if you have ANY problems

“A place for everything, everything in its place”

  • data, for raw/intermediate data files
  • doc, for documentation
  • images, for graphs/illustrations
  • results, for program output
  • src, for program code
  • Other folders as needed
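
The folder layout above could be created from within R along these lines. This is a minimal sketch: the project location (a folder named my-project inside R's temporary directory) is a hypothetical placeholder, so substitute your own path.

```r
# Hypothetical project location; replace with your own folder.
project_root <- file.path(tempdir(), "my-project")

# The five standard folders listed above.
folders <- c("data", "doc", "images", "results", "src")

# Create each folder (and the project root, if it does not exist yet).
for (folder in folders) {
  dir.create(
    file.path(project_root, folder),
    recursive=TRUE,
    showWarnings=FALSE)
}

# List what was created.
list.files(project_root)
```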

Break #2

  • What you have learned
    • Installing R
  • What’s coming next
    • Objects in R

Introduction

This is a very brief introduction to the basic objects in R.

R.version.string
[1] "R version 4.3.0 (2023-04-21 ucrt)"
Sys.Date()
[1] "2024-12-30"

Functions

sqrt(3)
[1] 1.732051
sqrt(1:5)
[1] 1.000000 1.414214 1.732051 2.000000 2.236068
barplot(height=5:1, names=1:5)

Nested functions and pipes

x <- 0.9
y <- asin(sqrt(x))
y
[1] 1.249046
x |>
  sqrt() |>
  asin() -> y
y
[1] 1.249046

Named arguments in functions

qnorm(p=0.99, mean=100, sd=15)
[1] 134.8952
qnorm(0.99, 100, 15)
[1] 134.8952
qnorm(0.99)
[1] 2.326348
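
The three calls above differ only in how the arguments are matched. The sketch below, which uses only base R, checks that named arguments can appear in any order, that unnamed arguments are matched by position, and that omitted arguments fall back to their defaults (mean=0 and sd=1 for qnorm).

```r
# Named arguments can appear in any order.
named_in_order <- qnorm(p=0.99, mean=100, sd=15)
named_shuffled <- qnorm(sd=15, mean=100, p=0.99)
all.equal(named_in_order, named_shuffled)

# Unnamed arguments are matched by position: p, then mean, then sd.
by_position <- qnorm(0.99, 100, 15)
all.equal(named_in_order, by_position)

# Omitted arguments take their default values.
all.equal(qnorm(0.99), qnorm(0.99, mean=0, sd=1))
```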

Scalars

scalar_example_1 <- 3
scalar_example_1
[1] 3
scalar_example_2 <- "R"
scalar_example_2
[1] "R"
scalar_example_3 <- "3"
scalar_example_3
[1] "3"

Vectors

vector_example_1 <- c(1, 2, 3)
vector_example_1
[1] 1 2 3
vector_example_2 <- c("a", "b", "c")
vector_example_2
[1] "a" "b" "c"
vector_example_3 <- c("a", 2)
vector_example_3
[1] "a" "2"
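
The third example shows that a vector holds only one type of data: mixing a string and a number turns everything into a string. A short sketch using class() makes the conversion visible.

```r
vector_example_1 <- c(1, 2, 3)
vector_example_3 <- c("a", 2)

class(vector_example_1)  # "numeric"
class(vector_example_3)  # "character"

# Converting back to numbers: "a" has no numeric value, so it becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
suppressWarnings(as.numeric(vector_example_3))  # NA 2
```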

Naming vectors

my_degrees <- c(
  BA=1977, 
  MS=1978, 
  PhD=1982)
my_degrees
  BA   MS  PhD 
1977 1978 1982 
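
Once a vector has names, you can use them to pull out individual values. A brief sketch, repeating the definition of my_degrees so that it runs on its own:

```r
my_degrees <- c(
  BA=1977, 
  MS=1978, 
  PhD=1982)

# List the names.
names(my_degrees)

# Single brackets keep the name; double brackets drop it.
my_degrees["PhD"]
my_degrees[["PhD"]]

# Years between the first and last degree.
my_degrees[["PhD"]] - my_degrees[["BA"]]  # 5
```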
my_name <- c(
  first_name="Stephen", 
  middle_initial="D", 
  last_name="Simon")
my_name
    first_name middle_initial      last_name 
     "Stephen"            "D"        "Simon" 

Matrices using cbind and rbind functions

matrix_example_1 <- 
  cbind(
    c(1, 2, 3), 
    c(4, 5, 6))
matrix_example_1
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
matrix_example_2 <- 
  rbind(
    c(1, 2, 3), 
    c(4, 5, 6))
matrix_example_2
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

Matrices using the matrix function

matrix_example_3 <- 
  matrix(
    c(1, 2, 3, 4, 5, 6), 
    nrow=2, 
    ncol=3, 
    byrow=TRUE)
matrix_example_3
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
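
The byrow argument controls the order in which the matrix is filled. The sketch below contrasts byrow=TRUE with the default (byrow=FALSE, which fills column by column). The object names are chosen for illustration only.

```r
filled_by_row <- matrix(1:6, nrow=2, ncol=3, byrow=TRUE)
filled_by_col <- matrix(1:6, nrow=2, ncol=3)

# Same dimensions, different arrangement.
dim(filled_by_row)   # 2 3
filled_by_row[2, 1]  # 4
filled_by_col[2, 1]  # 2

# Individual cells are addressed as [row, column].
filled_by_row[2, 3]  # 6
```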

Lists

list_example_1 <- 
  list(
    scalar_example_1, 
    vector_example_2, 
    matrix_example_3)
list_example_1
[[1]]
[1] 3

[[2]]
[1] "a" "b" "c"

[[3]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

Lists using names

list_example_2 <- 
  list(
    name=my_name, 
    degrees=my_degrees, 
    age=64)
list_example_2
$name
    first_name middle_initial      last_name 
     "Stephen"            "D"        "Simon" 

$degrees
  BA   MS  PhD 
1977 1978 1982 

$age
[1] 64
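
Names also make it easier to extract pieces of a list. A minimal sketch, rebuilding list_example_2 so that the code runs on its own:

```r
list_example_2 <- 
  list(
    name=c(first_name="Stephen", middle_initial="D", last_name="Simon"), 
    degrees=c(BA=1977, MS=1978, PhD=1982), 
    age=64)

# The dollar sign and double brackets both return the element itself.
list_example_2$age
list_example_2[["degrees"]]

# Single brackets return a (shorter) list.
class(list_example_2["age"])  # "list"

# Extraction can be chained.
list_example_2$degrees[["MS"]]  # 1978
```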

Data frames

data_frame_example_1 <- 
  data.frame(
    vector_example_1, 
    vector_example_2)
data_frame_example_1
  vector_example_1 vector_example_2
1                1                a
2                2                b
3                3                c

Naming data frame columns

data_frame_example_2 <- 
  data.frame(
    c(1, 2, 3), 
    c("a", "b", "c"))
data_frame_example_2
  c.1..2..3. c..a....b....c..
1          1                a
2          2                b
3          3                c
data_frame_example_3 <- 
  data.frame(
    small_numbers=c(1, 2, 3), 
    early_letters=c("a", "b", "c"))
data_frame_example_3
  small_numbers early_letters
1             1             a
2             2             b
3             3             c
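
If a data frame already has awkward default names, you can replace them after the fact. The sketch below uses the base R names() function; it is one possible approach, not the only one.

```r
data_frame_example_2 <- 
  data.frame(
    c(1, 2, 3), 
    c("a", "b", "c"))

# Overwrite the default column names.
names(data_frame_example_2) <- c("small_numbers", "early_letters")
names(data_frame_example_2)

# Good names make individual columns easy to extract.
data_frame_example_2$early_letters
```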

Tibbles

library(tidyverse)

tibble_example_1 <- 
  tibble(
    x=c(1, 2, 3),
    y=c("a", "b", "c"))
tibble_example_1
# A tibble: 3 × 2
      x y    
  <dbl> <chr>
1     1 a    
2     2 b    
3     3 c    

Vector or tibble?

sample_vector <- 1:5
sample_vector
[1] 1 2 3 4 5
sample_tibble <- tibble(sample_vector)
sample_tibble
# A tibble: 5 × 1
  sample_vector
          <int>
1             1
2             2
3             3
4             4
5             5
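
A vector and a one-column tibble hold the same numbers but are different kinds of objects. The sketch below assumes the tibble package (part of the tidyverse) is installed and shows how to get the vector back out of the tibble.

```r
library(tibble)

sample_vector <- 1:5
sample_tibble <- tibble(sample_vector)

# The tibble is a data frame; the vector is not.
is.data.frame(sample_vector)  # FALSE
is.data.frame(sample_tibble)  # TRUE

# The dollar sign extracts the column as an ordinary vector.
sample_tibble$sample_vector

# A function that expects a vector needs the extracted column.
mean(sample_tibble$sample_vector)  # 3
```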

Break #3

  • What you have learned
    • Objects in R
  • What’s coming next
    • Anatomy of a small R program

Anatomy of a small R program, overview

YAML header

---
title: "Illustrating the structure of an R program"
editor: source
format: 
  html:
    embed-resources: true
execute: 
  error: true
---

First comment


This program was written by Steve Simon and created on 2019-01-28 with a major
revision on 2024-12-27. It is used to illustrate the structure of an R program. 
This program is in the public domain. You can use it any way that you please.

First code chunk

```{r}
#| label: setup
#| message: false
#| warning: false

R.version.string
Sys.Date()
library(tidyverse)
```

Second comment


Read data from the aids-cases text file. This file is described at

https://github.com/pmean/data/blob/main/files/aids-cases.yaml

Second code chunk

```{r}
#| label: read-text-file

aids_cases <- read_csv(
  file="../data/aids-cases.csv",
  col_types="nnn")
glimpse(aids_cases)
```

Third comment


This is a small dataset with only three variables. Now let's draw a line graph.

Third code chunk

```{r}
#| label: line-graph

aids_cases |>
  ggplot() +
    aes(yr, nsw) +
    geom_line()
```

Fourth comment


There is an increasing trend in AIDS cases in New South Wales over time.

Anatomy of a small program, review

Output, overview

Output, part 1

Output, part 2

Output, part 3

Suggestions for nice looking comments

  • Quarto (and R Markdown) use tagged text files
    • Based on Markdown
    • Easy to remember
    • Easy to read in its raw form
    • Use any program that edits text files
  • Interface with Pandoc to convert to (and from)
    • Microsoft Word, PowerPoint
    • HTML files
    • PDF files

Suggestions for nice looking comments

  • Start line with ## for headlines
  • Start lines with -, +, or * for bulleted lists
    • Indent for sub bullets
  • Surround text with ** for bold
  • Surround text with $ for Greek letters (\(\mu\)) and math symbols (\(\sqrt{2}\))
  • Use [] for hyperlinks

Many more in the Quarto guide

An example of raw Markdown codes

## Suggestions for nice looking comments

-   Start line with ## for headlines
-   Start lines with -, +, or * for bulleted lists
    -   Indent for sub bullets
-   Surround text with ** for **bold**
-   Surround text with $ for Greek letters ($\mu$) and math symbols ($\sqrt{2}$) 
-   Use [] for hyperlinks

Many more in [quarto guide][ref43]

[ref43]: https://quarto.org/docs/authoring/markdown-basics.html

Break #4

  • What you have learned
    • Anatomy of a small R program
  • What’s coming next
    • Live demonstration

Live demonstration of running R

In this segment, you will see a live demonstration running the program simon-5505-01-template.qmd.

Break #5

  • What you have learned
    • Live demonstration
  • What’s coming next
    • Good programming practices

General requirements for any program

There are standards in six areas:

  • Documentation
  • Graphs
  • Tables
  • Readability
  • Interpretation
  • Conciseness

There may be times when one or two of these standards do not apply. Which standards apply and which don’t should be obvious from the nature of the programming assignment.

Documentation is required!

Documentation should include

  • the name of the author (you!),
  • the creation date,
  • the purpose of your program, and
  • any restrictions on use (your choice).
    • Public domain (no restrictions)
    • Specific restrictions on how others can use your program

Graphs cannot rely on default choices, 1

Always modify your graphs. Do not settle for the default options.

  • Include your name and date on the title of any graph
    • “Steve Simon produced this graph on 2023-09-19.”
  • Avoid the display of unnecessary decimal places on the axes
  • Use comma separators for large numbers
  • Replace category codes with descriptive labels

Graphs cannot rely on default choices, 2

  • Replace short variable names with longer descriptors
    • Include units of measurement, if needed
  • Avoid the gratuitous use of color
    • Unless needed to distinguish between groups
    • Fill boxes and points with white/transparent colors
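
The sketch below shows how several of these modifications might look in ggplot2. The data frame, its variable names, the region labels, and the placeholder author name are all made up for illustration. The code assumes that the ggplot2 and scales packages are installed.

```r
library(ggplot2)

# Hypothetical data, invented purely for illustration.
example_data <- data.frame(
  grp=c(1, 1, 2, 2),
  yr=c(2001, 2002, 2001, 2002),
  n=c(12000, 15500, 9000, 11250))

# Replace category codes with descriptive labels.
example_data$grp_label <- factor(
  example_data$grp,
  levels=c(1, 2),
  labels=c("Region A", "Region B"))

example_plot <- 
  ggplot(example_data) +
    aes(x=yr, y=n, shape=grp_label) +
    # White fill; groups are distinguished by shape, not color.
    geom_point(fill="white", size=3) +
    scale_shape_manual(values=c(21, 24)) +
    # Whole years only, so no unnecessary decimals on the axis.
    scale_x_continuous(breaks=c(2001, 2002)) +
    # Comma separators for large numbers.
    scale_y_continuous(labels=scales::comma) +
    # Name and date in the title; longer descriptors on the axes.
    labs(
      title=paste0("Your Name produced this graph on ", Sys.Date(), "."),
      x="Calendar year",
      y="Number of cases",
      shape="Region")
example_plot
```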

Tables also need modification

  • Round to two or three significant figures
  • Use comma separators if numbers are >= 1,000
  • Avoid scientific notation (e.g., 1.23E-04)
  • Avoid reporting small p-values as zero (e.g., p=0.000)
    • Change to p<0.001
  • Suppress the printing of unneeded tables
    • Sometimes difficult
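
Base R can handle most of these adjustments. In the sketch below, format_p is a hypothetical helper written for this example, not a function from any package.

```r
# Round to three significant figures.
signif(c(123.456, 0.0123456), 3)  # 123.0000 0.0123

# Comma separators for large numbers.
format(1234567, big.mark=",")  # "1,234,567"

# Avoid scientific notation.
format(0.000123, scientific=FALSE)  # "0.000123"

# Hypothetical helper: report very small p-values as p<0.001.
format_p <- function(p) {
  ifelse(
    p < 0.001,
    "p<0.001",
    paste0("p=", formatC(p, format="f", digits=3)))
}
format_p(c(0.00004, 0.0312))  # "p<0.001" "p=0.031"
```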

Sometimes default tables/graphs are acceptable

  • Early assignments may ask for defaults
  • Always round and specify units in your interpretations

Your code must be easy to read

  • Make liberal use of
    • blank lines
    • line breaks
    • indenting
    • vertical lists

Always include an interpretation

  • Use simple evaluative words
    • Young/Elderly
    • Less than half/more than half
    • Almost all/almost none
    • Substantial improvement/roughly comparable
  • Depends on context
    • No penalty for subjective judgments

Conciseness

  • Do not include analyses that were not asked for
  • Avoid displaying excessively large tables
    • This may be difficult for SAS and SPSS

Data dictionary

If you include a data set that you found on your own rather than one that your instructors provided, you must include a data dictionary. The elements of a data dictionary should include:

  • Source
  • Description
  • Copyright
  • Size
  • Variables

Data dictionary: Source

  • Where did you find the data?
    • Website link
    • Formal reference (if available)

Include a complete URL, unless your data is behind a paywall. If your data is associated with a peer-reviewed publication, provide a formal reference to that publication.

Data dictionary: Description

Provide a few sentences explaining the context of your data. Explain how the data was collected and what it is being used for.

Data dictionary: Size

  • Number of rows (excluding a header row)
  • Number of columns

Data dictionary: Variables

  • Name
  • Label
  • Units of measure

Data dictionary: Variable scale

  • Scale
    • Nominal
    • Ordinal
    • Interval
    • Ratio

Data dictionary: Variable range

  • Range
    • Non-negative (>= 0)
    • Positive (> 0)
    • Upper bound, if any

Data dictionary: Variable type

  • Type
    • Integer
    • Float
    • Character

File details

This file was written by Steve Simon on 2024-12-26. It is in the public domain and you can use it any way you please.

Break #6

  • What you have learned
    • Good programming practices
  • What’s coming next
    • Your programming assignment

tidyverse

Hex sticker for tidyverse

dplyr

Hex sticker for dplyr

ggplot2

Hex sticker for ggplot2

magrittr

Hex sticker for magrittr

readr

Hex sticker for readr

stringr

Hex sticker for stringr

tibble

Hex sticker for tibble

tidyr

Hex sticker for tidyr

Other packages in the tidyverse

  • In the core package
    • forcats
    • purrr
  • Outside the core package
    • broom
    • lubridate
    • readxl
    • many others

Load tidyverse quietly
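
One common way to load the tidyverse without its startup messages is sketched below. It assumes that the tidyverse package is installed. The setup chunk shown earlier gets a similar effect with the message: false chunk option.

```r
# Load the tidyverse without printing the list of attached
# packages and the notes about conflicting function names.
suppressPackageStartupMessages(library(tidyverse))

# Confirm that a core package is now attached.
"package:ggplot2" %in% search()
```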

Summary

  • What you have learned
    • History of R
    • Installing R
    • Objects in R
    • Anatomy of a small R program
    • Live demonstration
    • Good programming practices
    • Your programming assignment